Speech Rhythm and Rhythmic Taxonomy
Abstract
Of all prosodic variables used to classify languages, rhythm has proved most problematic. Recent attempts to classify languages based on the relative proportion of vowels or obstruents have had some success, but these seem only indirectly related to perceived rhythm. Coupling between nested prosodic units is identified as an additional source of rhythmic patterning in speech, and this coupling is claimed to be gradient and highly variable, dependent on speaker characteristics and text properties. Experimental results which illustrate several degrees of coupling between different prosodic levels are presented, both from previous work within the Speech Cycling paradigm and from new data. A satisfactory account of speech rhythm will have to take both language-specific phonological properties and utterance-specific coupling among nested production units into account.

1. On Classification and Taxonomy

Taxonomy involves the determination of discrete classes. In its classical manifestation, living forms are divided into discrete groups (species, genera, families, etc.), and criteria are established which help to decide which taxon a given exemplar should be assigned to. A basic assumption is that discrete classes exist underlyingly, and that a strict classification is, in principle, possible. In this regard it differs from the more general practice of biosystematics, which considers any and all relationships which exist among organisms. The data on which a classification is made may, of course, be insufficient to allow unambiguous classification of a given exemplar. By way of a simple example, we might consider a racially homogeneous population of men and women, in which men's heights are normally distributed around a given mean (say 2 m) with a certain standard deviation (say 0.5 m), while women's heights are similarly distributed around a different mean (say 1.8 m). Based only on a measure of height from an individual, we can only provide a probabilistic classification (a short numerical sketch of such a classification is given below). Nonetheless, there is assumed to be an underlying discrete difference between the classes.

There are many forms of linguistic taxonomy, most of which have the property that we have strong reason to suspect a discrete difference in some formal feature between the languages. For example, some languages have a basic word order in which the subject is ordered before the verb, which in turn precedes the object, while others order these three elements differently. Taxonomic licence is granted because of the discrete nature of the elements involved.

2. Prosody as a Basis for Taxonomy

Prosody has often been used as a basis for classifying languages. The grab bag of phenomena which can be linked under the label "prosody" leaves considerable scope for creative classification. Attempts have been made to classify languages based on stress, accent, intonation, lexical and morphological tone, and, of course, rhythm. However, it has not always been possible to unambiguously identify discrete elements corresponding to each of these dimensions with the same robustness as in the segmental, morphological or lexical domains. Distinctions based on syllable structure have been fairly uncontroversial, as a segmental inventory is relatively easy to obtain for a given language, and the principles of syllable structure have shown considerable generality.
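Returning briefly to the height example from Section 1: the probabilistic nature of such a classification is easy to make concrete. The following Python sketch is illustrative only; it assumes equal priors for the two classes and assumes that the women's standard deviation equals the men's (0.5 m), details the example above leaves open.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def p_male(height, prior_male=0.5, sd=0.5):
    """Posterior probability that an individual of the given height (in metres)
    is male, using the illustrative means from Section 1 (2.0 m vs. 1.8 m).
    Equal priors and a shared standard deviation are assumptions."""
    likelihood_m = normal_pdf(height, 2.0, sd)
    likelihood_f = normal_pdf(height, 1.8, sd)
    evidence = prior_male * likelihood_m + (1 - prior_male) * likelihood_f
    return prior_male * likelihood_m / evidence

if __name__ == "__main__":
    for h in (1.6, 1.9, 2.2):
        print(f"height {h:.1f} m -> P(male) = {p_male(h):.2f}")
```

Even for an individual well above the male mean, the classification remains probabilistic rather than certain, which is exactly the point of the example: the underlying classes are discrete, but the observable measure only supports a graded decision.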
Linguistic theories such as Autosegmental Phonology or Optimality Theory have provided well-founded and empirically supported theories of underlying discrete structures which permit classifications within and across languages. Distinctions based on fundamental frequency have had mixed success. On the one hand, one can identify languages which make use of lexical tone (e.g. Mandarin) and others which do not (e.g. English). Intermediate cases do exist (e.g. some dialects of Korean), but these are usually considered to represent transitional states of the language from one class to the other. The morphological use of tone familiar from the Niger-Congo languages of Africa represents another well-defined class. On the other hand, phenomena related to phrasal accents and phrasal intonation have proved less obviously amenable to a conventional linguistic treatment. To be sure, there are several theories of phrasal intonation which relate observed pitch contours to a discrete set of underlying linguistic elements [16]; however, agreement among theories as to the nature and count of such elements has been hard to arrive at. The situation is further complicated by the many non-linguistic roles of intonation, such as in adding emphasis or expressive variation. Several studies have demonstrated gradient rather than categorical phenomena here [11, 10].

But nowhere has the effort at establishing and defending a prosodic taxonomy had a harder time than in the domain of 'rhythm'. Without doubt, much of this lack of progress can be traced to differing interpretations of the term 'rhythm'. It will be a contention of this paper that at least two independent dimensions have been called to service in characterizing rhythm. One of these is related to syllable structure and segmental inventories, and may therefore offer the basis for a taxonomy. The other relates to a gradient phenomenon, not yet well understood, which mediates the role of syllables in determining macroscopic timing patterns. Its gradient nature precludes it from supporting a classification among languages. Furthermore, it will be claimed, pre-theoretical perceptions of rhythm (whether characteristic of a speaker or a language) are derived from an interplay between the discrete and the gradient phenomena.

3. Where is Rhythm in Speech?

3.1. Rhythm across languages

Our formal approaches to characterizing rhythm in speech are grounded in a pre-theoretical perception of a patterning in time which speech and music have, to some degree, in common. We become aware of something like rhythmic properties in speech when we contrast speech in different languages, and this is presumably the reason why rhythm has so often been called upon to support language classification. The ability to distinguish among languages based on a signal which preserves low frequency information has been documented in infants [13], while Ramus demonstrated a similar ability in adults using resynthesized speech in which segments were stripped of their identity, but not their broad phonetic class [17]. Many attempts have been made to identify a basis for this apparent perception of a rhythmic difference among languages. Simplistic notions based on isochronous units have been uniformly rejected [5].
Two current influential models [18, 9] take up a suggestion by Dauer [5] that languages may lie along a continuum (or in a continuous space), certain points of which have previously been identified with rhythmic classes (syllable-, stress- and mora-timed languages). They each develop continuous measures which can support clustering of languages in accordance with older taxonomic divisions. Since the introduction of the notion of gradient rhythmic qualities, it is no longer entirely clear that a taxonomy is being sought, as opposed to a more general systematic description of variation among languages. Ramus et al. [18] arrive at two (correlated) variables, defined over an utterance: the proportion of vocalic intervals (%V) and the standard deviation of the duration of consonantal intervals (ΔC). Both of these measures will be directly influenced by the segmental inventory and the phonotactic regularities of a specific language. That is, any classification based on these variables can be related to an underlying discrete system, and so true classification is, in principle, possible. Grabe and Low [9] relate rhythmic diversity to serial variability in (a) the inter-vowel-onset interval and (b) the interval between one vowel offset and the following onset. As with the previous measures, these two variables are not entirely independent, and their distributions will be dictated largely by the segmental inventory and phonotactics of a given language. Similar results have recently been suggested based on a sonority measure which captures the degree of obstruency in the signal [8]. Collectively these variables may be compared to alternative measures on our hypothetical population from Section 1: had we measured weight, or hair length, instead of height, we would likewise have found a bimodal distribution, with the same underlying cause.

3.2. Rhythm within speaker

There is another, distinct, sense in which speech is rhythmical, and this is related to fluency. As we speak, the fluency with which speech is generated varies continually. We are all familiar with both the ease with which fluent speech flows, and the debilitating effect of its opposite, the dysfluent event. This type of rhythm is considerably harder to quantify, as it can vary substantially within a single utterance, and is apparently subject to the vagaries of expression and rhetorical force as much as to language-specific constraints. Let the sentence presented by Abercrombie [1] as 'unambiguously' illustrating the stress-timed nature of English serve as an example: "Which is the train for Crewe, please?" Abercrombie's suggestion was that the reader tap along with the stresses while saying the sentence, and indeed, it is not difficult to speak this sentence with four roughly isochronous beats on the stressed syllables. However, any naturalistic rendition without the associated tapping will depart substantially from this regular pattern. Furthermore, a syllable-based timing can likewise be imposed on this sentence (think "angry, seething passenger faced with unhelpful guides"). Depending on the communicative situation, the rate of speech, the degree of expression, etc., rather different timing patterns can overlay one and the same utterance, for a single speaker. Some of these are regular enough that we would want our definition of speech rhythm to extend to them and their like. However, these patterns will clearly not be of much help in establishing a cross-language taxonomy.
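The interval-based measures discussed in Section 3.1 are straightforward to compute once an utterance has been segmented into vocalic and consonantal intervals. The Python sketch below is a minimal illustration, not the original authors' code: it computes %V and ΔC in the spirit of Ramus et al. [18], and a normalised pairwise variability index over vocalic intervals in the spirit of Grabe and Low [9]. The input format (a flat list of labelled interval durations) and the demo values are assumptions made purely for illustration.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """intervals: list of (label, duration_s) pairs, label 'V' (vocalic) or 'C' (consonantal).
    Returns %V, delta-C, and a normalised PVI over vocalic intervals."""
    v = [d for lab, d in intervals if lab == "V"]
    c = [d for lab, d in intervals if lab == "C"]

    percent_v = 100.0 * sum(v) / (sum(v) + sum(c))   # proportion of vocalic material
    delta_c = pstdev(c)                               # SD of consonantal interval durations

    # Normalised pairwise variability index: mean absolute difference between
    # successive vocalic intervals, normalised by their local mean.
    npvi_v = 100.0 * sum(
        abs(d1 - d2) / ((d1 + d2) / 2) for d1, d2 in zip(v, v[1:])
    ) / (len(v) - 1)

    return percent_v, delta_c, npvi_v

if __name__ == "__main__":
    # A made-up sequence of interval durations (seconds), purely for illustration.
    demo = [("C", 0.08), ("V", 0.12), ("C", 0.15), ("V", 0.07),
            ("C", 0.06), ("V", 0.20), ("C", 0.11), ("V", 0.09)]
    pv, dc, npvi = rhythm_metrics(demo)
    print(f"%V = {pv:.1f}, deltaC = {dc:.3f} s, nPVI(V) = {npvi:.1f}")
```

Nothing in such a computation refers to alternation, grouping, or periodicity; the values fall out of the segmental material alone, which is precisely the point taken up in the next paragraph.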
This variability raises the question of whether the kind of index proposed by Ramus, Grabe and others can meaningfully be said to capture anything about rhythm in speech. The discrete basis for the suggested taxonomy can be argued to be grounded in segmental inventories and syllabic phonotactics, and can therefore be accounted for without reference to anything resembling the pre-theoretical notion of rhythm described at the start of this section. More succinctly, where is the bom-di-bom-bom in %V?

The argument to be developed here is that there are indeed two distinct phenomena, which interact to provide a perception of rhythm in speech. On the one hand, there are linguistic units which vary discretely across languages. Thus English has its heavy and light syllables, stresses, feet, etc., while Japanese has its morae, perhaps a bi-moraic foot, and so on. These are symbolic, linguistic entities familiar from phonology, and language taxa can be constructed on foot thereof. To some extent these alone dictate the alternation of light and heavy elements in spoken language, and so they contribute to the rhythmic signature of a language. These units also serve as participants in hierarchical timing relationships, in which smaller prosodic units are nested within larger units, and the degree of coupling between levels varies in gradient fashion, as dictated by fluency, conversational intent, urgency, etc. As coupling varies continually, so too does the perceived rhythmicity of speech, and, perhaps, perceived fluency, though this direct association has yet to be tested.

The gradient coupling between prosodic levels (syllables within feet, feet within phrases, etc.) has been identified and modelled before [15]. It has also been observed experimentally in the Speech Cycling paradigm [4, 19], in which subjects repeat a short phrase in time with an external metronome. Results from Speech Cycling experiments with English and Japanese speakers will now briefly be reviewed to see if they can illuminate the relationship between these two interacting sources of "rhythm".

¹ Examples of particularly fluent speech exhibiting syllable-timed and stress-timed characteristics within an utterance by a single speaker are given at http://cspeech.ucd.ie/fred/speechrhythm/speechrhythm.html.
² Sorry.

4. Speech Cycling Results
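As background to these results: the dependent measure in Speech Cycling experiments is typically the relative phase at which a stressed-syllable onset occurs within the phrase repetition cycle, with clustering of phases around simple fractions of the cycle taken as evidence of coupling between prosodic levels. The Python sketch below only illustrates that computation under stated assumptions; the input times and function name are hypothetical and it is not the analysis code of the studies reviewed here.

```python
def relative_phases(stress_onsets, cycle_onsets):
    """Phase (0..1) of each stressed-syllable onset within its repetition cycle.
    `cycle_onsets` are start times of successive phrase repetitions;
    `stress_onsets` are times of the medial stressed syllable. Times in seconds."""
    phases = []
    for start, end in zip(cycle_onsets, cycle_onsets[1:]):
        period = end - start
        for t in stress_onsets:
            if start <= t < end:
                phases.append((t - start) / period)
    return phases

if __name__ == "__main__":
    # Made-up example: a phrase repeated every 2.0 s, with the medial stress
    # falling close to one third of the way through each cycle.
    cycles = [0.0, 2.0, 4.0, 6.0]
    stresses = [0.68, 2.64, 4.71]
    for ph in relative_phases(stresses, cycles):
        print(f"relative phase = {ph:.2f}")
```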